Distributions and non-parametric (distribution-free) tests

(Caveat: This list is sort of casual and may have some slight errors, especially in the descriptions of the non-parametric tests.)
COMMONLY USED DISTRIBUTIONS
binomial distribution
- distribution of possible event outcomes for varying numbers of trials with two complementary probabilities for occurrence and non-occurrence of the event, p and q (= 1 - p), such as for a coin flip where heads p = .5 and q = .5
- for example, the probability of flipping a coin three times and getting two heads and one tail (sketched in code below)
- includes other possible probabilities, for instance p = .75 and q = .25
- the Bernoulli distribution is the binomial distribution when the number of trials = 1, but this doesn't come up much in statistics for data analysis
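A quick sketch of that coin-flip example in Python, assuming scipy is available (the biased .75/.25 coin is just for illustration):

    # probability of exactly 2 heads in 3 flips of a fair coin
    from scipy.stats import binom
    print(binom.pmf(k=2, n=3, p=0.5))    # 3 * .5**2 * .5**1 = 0.375
    # same event with a biased coin, p = .75 and q = .25
    print(binom.pmf(k=2, n=3, p=0.75))   # 3 * .75**2 * .25**1 = 0.421875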
normal distribution
- the limit of the binomial distribution when the number of trials becomes infinite, as long as p isn't 0 or 1 (a rough numerical check is sketched below)
- describes outcomes of multiple causal factors that tend to cancel each other's effects, yielding many scores near the mean, but less frequently work together in the same direction to push scores higher or lower, yielding scores in the positive or negative tails
- gives the familiar bell-shaped curve for any mean μ and standard deviation σ (variance σ²)
- has skewness of 0 and kurtosis of 3 (ie neither leptokurtic nor platykurtic)
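A rough numerical check of that limit, assuming numpy and scipy; with n = 1,000 trials the binomial probabilities already sit very close to the normal curve:

    # compare the binomial pmf to the normal pdf with matching mean and SD
    import numpy as np
    from scipy.stats import binom, norm
    n, p = 1000, 0.5
    mu, sigma = n * p, np.sqrt(n * p * (1 - p))   # binomial mean and SD
    k = np.arange(450, 551)                       # values near the mean
    print(np.abs(binom.pmf(k, n, p) - norm.pdf(k, mu, sigma)).max())  # tiny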
standard normal distribution (z)
- normal distribution with μ = 0 and σ (and σ²) = 1
- describes distances of scores Y from the population mean μ in units of the population standard deviation σ, because z = (Y - μ)/σ (where μ and σ could include the subscript Y, as in μ_Y and σ_Y, but that's usually assumed)
- also describes distances of sample means M from the population mean μ in units of the population standard error of the mean (σ_M), because z = (M - μ_M)/σ_M
- μ_M is the mean of the sample means, ie the mean of the set of means of all the possible samples of a given size that could be taken from a population; it's equal to μ, the mean of the scores
- σ_M is the standard deviation of the sample means, ie the standard deviation of that same set of means of all the possible samples of a given size that could be taken from a population; it's equal to σ/√N (simulated in the sketch below)
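A simulation sketch of those last two bullets, assuming numpy (μ = 100, σ = 15, N = 25 are arbitrary choices):

    # sampling distribution of the mean: mu_M = mu, sigma_M = sigma/sqrt(N)
    import numpy as np
    rng = np.random.default_rng(0)
    mu, sigma, N = 100, 15, 25
    means = rng.normal(mu, sigma, size=(100_000, N)).mean(axis=1)
    print(means.mean())       # ~ 100, ie mu_M = mu
    print(means.std())        # ~ 3,   ie sigma_M = sigma/sqrt(N) = 15/5
    z = (means - mu) / (sigma / np.sqrt(N))   # z = (M - mu_M)/sigma_M
    print(z.mean(), z.std())  # ~ 0 and ~ 1: standard normal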
t distribution
- like the z distribution but used when the population standard error of the mean σ_M is unknown and therefore is estimated by the sample standard error of the mean s_M
- takes on different shapes with more spread-out, heavier tails depending on degrees of freedom (df = number of observations minus 1), because not just the sample mean M but also s_M varies from sample to sample
- with increasing and eventually infinite df (or pragmatically df > 120 or so), t more and more closely approximates z
- describes distances of sample means M from the population mean μ in units of the estimated standard error of the mean (s_M), because t = (M - μ_M)/s_M (checked against scipy in the sketch below)
- μ_M is, as above, the mean of the set of means of all the possible samples of a given size that could be taken from a population; it's equal to μ, the mean of the scores
- s_M is the standard deviation of the sample means as estimated from the sample itself, ie the estimated standard deviation of that same set of means of all the possible samples of a given size that could be taken from a population; it's equal to s/√N
- not limited to sample means but also used with any statistic (specifically, the statistic's distance from a hypothesized value) divided by its standard error; for instance with regression b-weights, t = (b - β)/s_b, where β is the population value of b, typically hypothesized to be 0 under the null hypothesis, and s_b is the estimated standard error of b, ie the standard deviation of all the values of b that would be obtained from every possible sample of a given size, as estimated from the sample itself
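A sketch of the t formula checked against scipy's built-in test (the sample and the hypothesized μ = 100 are made up):

    # one-sample t from the definition, compared with scipy
    import numpy as np
    from scipy import stats
    rng = np.random.default_rng(1)
    y = rng.normal(103, 15, size=20)            # N = 20 scores
    M, s, N = y.mean(), y.std(ddof=1), len(y)
    s_M = s / np.sqrt(N)                        # estimated standard error of the mean
    print((M - 100) / s_M)                      # t = (M - mu_M)/s_M
    print(stats.ttest_1samp(y, 100).statistic)  # same value
    # t approaches z as df grows: compare two-tailed .05 cutoffs
    print(stats.t.ppf(.975, 10), stats.t.ppf(.975, 120), stats.norm.ppf(.975))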
F distribution
- ratio of two chi square (χ²) values when each is divided by its df
- since χ² divided by its df describes the sampling distribution of a sample variance (apart from scaling by the population variance), F also represents the ratio of two sample variances (each estimating the same underlying population variance), which is how it's used in ANOVA
- characterized by two degrees of freedom values, for numerator and denominator df (since the numerator and denominator of the F ratio are each a variance with associated df)
- when numerator df = 1, F is equal to the square of the value of t on the denominator df (demonstrated below)
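A quick check of that last bullet, assuming scipy (df = 20 is arbitrary):

    # with numerator df = 1, F equals t squared on the denominator df
    from scipy import stats
    df = 20
    t_crit = stats.t.ppf(0.975, df)             # two-tailed t cutoff, alpha = .05
    f_crit = stats.f.ppf(0.95, dfn=1, dfd=df)   # F cutoff, alpha = .05
    print(t_crit ** 2, f_crit)                  # identical (~ 4.35)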
chi square (χ2) distribution
- describes possible values of the sum of a number of squared values randomly sampled from the z distribution
- degrees of freedom is the number of z scores squared and summed
- has different shapes with increasing degrees of freedom, going from positively skewed to more and more normally distributed
- mean of a given χ² distribution is its df; variance is twice its df (simulated below)
- typically used to evaluate "goodness of fit" of various calculable statistics, for instance with χ² statistics measuring discrepancies between observed and expected values in count data, and in contingency tables ("test of independence"); also used to evaluate model fit with the "deviance" statistic, for which a model's likelihood (L) is calculated, then changed to its logarithm (LL), switched from a negative to a positive sign (-LL), and multiplied by 2 (-2LL) to give "-2 log likelihood"; the difference in -2LL between nested models fits the χ² distribution
- usually pronounced "chi square" rather than "chi squared"; the symbol is the lowercase Greek letter chi, χ², not uppercase Χ²
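A simulation of the sum-of-squared-z's definition, plus a goodness-of-fit example (the die counts are invented):

    # chi square as a sum of df squared z scores; mean = df, variance = 2*df
    import numpy as np
    from scipy.stats import chisquare
    rng = np.random.default_rng(2)
    df = 5
    draws = (rng.standard_normal((100_000, df)) ** 2).sum(axis=1)
    print(draws.mean(), draws.var())            # ~ 5 and ~ 10
    # goodness of fit for count data: is this die fair?
    print(chisquare([18, 24, 16, 14, 29, 19]))  # expected counts default to equal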
Poisson distribution
- describes the probability of a discrete event occurring a given number of times in a given time interval or spatial location, when the average rate of occurrence is a known constant and the events occur independently of each other
- not a continuous distribution like z, t, F, and χ², but discrete like the binomial, giving a probability for the event occurring 0 times, 1 time, 2 times, etc., but obviously not for it occurring 2.5 times, which would be impossible
- for example, the number of meteors of a certain size hitting the Earth in a given year, or the number of customers arriving at a counter or calling in to a call center per hour, or the number of visits to an internet web site per minute, or the number of goals in a soccer game (sketched in code below), or the number of deaths per year for a given age group, or a patient's number of psychotic episodes in a month
- the logarithm of the event's expected frequency can be modeled using various predictors in Poisson regression, where the dependent variable Y fits the Poisson distribution
- Les poissons, les poissons, how I love les poissons -- love to chop and to serve little fish. First I cut off their heads, then I pull out their bones. Ah mais oui, ça c'est toujours dlish! Les poissons, les poissons (hee-hee-hee, hoh-hoh-hoh), with a cleaver I hack them in two; I pull out what's inside and I serve it up fried. God, I love little fishes, don't you?
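Back to statistics: a minimal sketch of the soccer-goals example, assuming (hypothetically) an average of 2.5 goals per game:

    # P(k goals in a game) when goals average 2.5 per game, independently
    from scipy.stats import poisson
    for k in range(6):
        print(k, poisson.pmf(k, mu=2.5))
    # discrete: defined at k = 0, 1, 2, ... but not at 2.5 goals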
COMMONLY USED NON-PARAMETRIC (DISTRIBUTION-FREE) TESTS
Kendall's tau correlation
Spearman rank correlation
Kendall's W measure of interrater agreement across rankings (coefficient of concordance)
Cohen's Kappa categorical measure of interrater agreement
Kolmogorov-Smirnov one sample or two independent sample distribution comparison (analogous to independent measures t-test)
Mann-Whitney U for two samples (analogous to independent measures t-test)
Wilcoxon signed rank test for dependent / paired / matched samples (analogous to paired samples t-test)
Kruskal-Wallis one-way ANOVA using ranks
Friedman two-way ANOVA using ranks, also for repeated measures ANOVA
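Most of these have implementations in scipy.stats; a quick sketch with made-up data:

    # a few of the tests above, run on invented numbers
    from scipy import stats
    x = [12, 15, 9, 20, 17, 11]
    y = [14, 10, 8, 16, 9, 7]
    print(stats.kendalltau(x, y))     # Kendall's tau correlation
    print(stats.spearmanr(x, y))      # Spearman rank correlation
    print(stats.ks_2samp(x, y))       # Kolmogorov-Smirnov, two samples
    print(stats.mannwhitneyu(x, y))   # Mann-Whitney U, independent samples
    print(stats.wilcoxon(x, y))       # Wilcoxon signed rank, paired samples
    print(stats.kruskal(x, y))        # Kruskal-Wallis one-way ANOVA on ranks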